"SZTAKI-DMS: OWLAP Analytics Beta"

VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

 

László Dudás, MTA SZTAKI, ldudas@info.ilab.sztaki.hu

Zsolt Fekete, MTA SZTAKI, zsfekete@info.ilab.sztaki.hu

Julianna Göbölös-Szabó, MTA SZTAKI, gszj@info.ilab.sztaki.hu, PRIMARY

András Radnai, MTA SZTAKI, aradnai@info.ilab.sztaki.hu

Ágnes Salánki, MTA SZTAKI, salankia@info.ilab.sztaki.hu

Adrienn Szabó, MTA SZTAKI, aszabo@info.ilab.sztaki.hu

Gábor Szucs, MTA SZTAKI, szgabbor@info.ilab.sztaki.hu


Student Team:  NO

Tool(s):

Owlap Analytics Pro was developed by our team specifically for the VAST Challenge 2012.

Prezi, http://prezi.com/, used by permission of the Prezi Team.

Prezi is a cloud-based (SaaS) presentation software and storytelling tool for exploring and sharing ideas on a virtual canvas. Prezi is distinguished by its zooming user interface (ZUI), which enables users to zoom in and out of their presentation media. Prezi allows users to display and navigate through information within a 2.5D space on the Z-axis. (from Wikipedia)

 

Video:

Sztaki VAST 2012

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1  Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe? 

We observed two types of problems at 14:00 on 2 February:

1. In some regions a high ratio of machines is not working:

Altogether approx. 9% of the machines are out of order (79761 of 888977). We attribute this to two phenomena: in some western regions the workday had not started yet, while other regions show suspicious characteristics.


In Datacenter-5 (in the headquarters region) a high number of servers (49240 of 51325) are out of order. Unexpectedly, all 5 workstations that belong to this facility are functioning well at this time.


Illustration 1: 02.02.2012, 14:00, dc-5: Number of servers and workstations, colored by policy status



In Region-25 some of the facilities seem to be unavailable. In the following branches none of the computers provide any data: branches 2, 3, 4, 8, 9, 23, 27, 30, 33, 39, 44 and 47. The rest of the facilities (headquarters and all other branches under branch-50) are online and follow the global trend.


Illustration 2: 02.02.2012, 14:00, region-25: geospatial distribution of unavailable and healthy machines



2. A relatively high number of computers are not healthy:
Approximately 16.2% of all computers (144 311 machines) show moderate policy deviance and 3034 show serious deviance.
Especially worrying is that in Region-5 and Region-10 none of the machines are healthy. Note that Datacenter-5 is located within the area of Region-5, although these two areas have distinct problems.
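
The ratios quoted in this answer can be re-derived from the raw counts. The following is a minimal sketch in Python; the helper name share_percent is our own illustration and not part of the tool, and only the counts are taken from the text above.

```python
def share_percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts quoted in the answer above (14:00 on 2 February).
TOTAL_MACHINES = 888_977

offline = share_percent(79_761, TOTAL_MACHINES)        # machines out of order
moderate = share_percent(144_311, TOTAL_MACHINES)      # moderate policy deviance
serious = share_percent(3_034, TOTAL_MACHINES)         # serious policy deviance
dc5_servers_offline = share_percent(49_240, 51_325)    # Datacenter-5 servers

for label, value in [
    ("out of order", offline),
    ("moderate deviance", moderate),
    ("serious deviance", serious),
    ("dc-5 servers out of order", dc5_servers_offline),
]:
    print(f"{label}: {value:.2f}%")
```

The Datacenter-5 figure shows that almost all of its servers are affected, which is why this facility stands out from the approx. 9% global rate.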


Illustration 3: 02.02.2012, 14:00: average policy status in regions



MC 1.2  Use your visualization tools to look at how the network's status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 1. Blackout in the headquarters' Datacenter-5.
At the beginning of the period all the servers and 2 workstations are offline. Only 3 workstations provide any data from this area; the 2 remaining workstations are turned on later.


Illustration 4: Policy status distribution in time (Axis-X) in dc-5: high ratio of unavailable machines at the beginning



At 4:45 AM (BMT 10:45) 240 servers (of various types) turn on for 60 minutes. However, no suspicious activity or traffic can be observed in this period.
Later all machines start to work again.


Illustration 5: dc-5: 240 servers are turned on during the blackout (green pattern surrounded with blue bars)



    2. Partial blackout in Region-25
    The blackout starts at 9:15 AM (BMT 12:15) and lasts until 01:00 AM (BMT 04:00). It affects 36 of the 50 facilities, each in a distinct time interval. While a facility is shut down, none of its machines provide any data. Interestingly, the blackout seems to propagate from south to north. This is probably caused by a power cut in the respective buildings (or hackers are attacking our system).
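
The per-facility blackout intervals can be extracted from a time series of reporting machine counts. The sketch below is only an illustration of that idea, not the code of our tool: the function name, the input layout (time-sorted pairs of timestamp and number of reporting machines for one facility) and the sample values are all hypothetical.

```python
def blackout_intervals(reporting_counts):
    """Find intervals in which no machine of a facility reports.

    reporting_counts: time-sorted list of (timestamp, n_reporting) pairs.
    Returns a list of (start, end) pairs: start is the first sample with
    zero reporting machines, end is the first later sample that reports
    again (None if the facility is still dark at the end of the data).
    """
    intervals = []
    start = None
    for timestamp, n_reporting in reporting_counts:
        if n_reporting == 0 and start is None:
            start = timestamp                      # blackout begins
        elif n_reporting > 0 and start is not None:
            intervals.append((start, timestamp))   # blackout ends
            start = None
    if start is not None:
        intervals.append((start, None))            # never recovered
    return intervals


# Hypothetical samples for one facility (not taken from the data set).
samples = [("09:00", 12), ("09:15", 0), ("09:30", 0), ("09:45", 12), ("10:00", 12)]
print(blackout_intervals(samples))
```

Running this per facility and sorting the facilities by latitude would be one way to check the south-to-north propagation visible on the map.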



    Illustration 6: Region-25: blackout during the first day. Availability is shown per facility on the map (blue means NA, green means healthy)

    3. Region-5 and Region-10
    In contrast to the observable trends in the system, there are no healthy machines at all in these two regions even at the beginning of the period.


    Illustration 7: Unhealthy regions: region-5 and region-10 do not contain any healthy machines, even at the period start

    4. Unusual network traffic in Region-10, generated by tellers
    Normally teller workstations should have fewer than 5 connections on average outside business hours. In Region-10, however, we observed unexpectedly high connection numbers on teller workstations during the nights.
    On the first night this phenomenon can be observed only in a few facilities, while on the second night all facilities show this anomalous behaviour.
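
A check of this kind can be sketched as follows. This is an illustrative Python sketch, not our tool's code: the record layout, the function names and the sample values are hypothetical, and the business-hours window (07:00 to 18:00 local time) is an assumption. Only the limit of 5 connections comes from the text above.

```python
from collections import defaultdict
from datetime import datetime

CONNECTION_LIMIT = 5       # from the text: fewer than 5 connections expected
BUSINESS_START_HOUR = 7    # assumption, not stated in the challenge answer
BUSINESS_END_HOUR = 18     # assumption, not stated in the challenge answer


def is_off_hours(local_time: datetime) -> bool:
    """True if local_time falls outside the assumed business hours."""
    return not (BUSINESS_START_HOUR <= local_time.hour < BUSINESS_END_HOUR)


def flag_night_traffic(records):
    """Average teller connections per (facility, time slot) outside
    business hours and keep the slots whose average exceeds the limit.

    records: iterable of (facility, local_time, connections) tuples,
    one per teller workstation and time slot.
    """
    sums = defaultdict(lambda: [0, 0])   # key -> [total connections, count]
    for facility, local_time, connections in records:
        if is_off_hours(local_time):
            slot = sums[(facility, local_time)]
            slot[0] += connections
            slot[1] += 1
    return {
        key: total / count
        for key, (total, count) in sums.items()
        if total / count > CONNECTION_LIMIT
    }


# Hypothetical records (values invented for illustration only).
records = [
    ("branch-43", datetime(2012, 2, 2, 3, 15), 42),
    ("branch-43", datetime(2012, 2, 2, 3, 15), 48),
    ("branch-44", datetime(2012, 2, 2, 3, 15), 2),
    ("branch-43", datetime(2012, 2, 2, 10, 0), 60),   # business hours, ignored
]
flagged = flag_night_traffic(records)
print(flagged)
```

Averaging per facility rather than per machine matches the way the anomaly is presented here, where whole branches switch between the two behaviours.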


    Illustration 8: Average connection number of teller workstations in region-10: unexpected network traffic during the night





    Illustration 9: Region-10: average connection number by facility. Two types of behaviour can be observed, e.g. in branch-43 and branch-44



      The exact details for the night of 2 February are given in the following table:

      Branch       Starting time (local time)   End (local time)   Average connection number
      Branch-6     02:15:00 AM                  05:00:00 AM        10 - 15
      Branch-24    03:15:00 AM                  05:00:00 AM        30 - 40
      Branch-43    03:15:00 AM                  05:00:00 AM        40 - 50
      Branch-57    03:15:00 AM                  05:00:00 AM        15 - 20
      Branch-76    04:15:00 AM                  05:00:00 AM        10 - 15
      Branch-89    04:15:00 AM                  05:00:00 AM        15 - 25
      Branch-106   04:15:00 AM                  05:00:00 AM        30 - 35

      On the second night all 250 facilities generate increased traffic. This event begins at 2:15:00 AM (local time) and ends at 5:00 AM. The average connection number in this time interval is about 25.


       5. Infection characteristics

      The trend shows that more and more computers become "sick", which means that they are no longer in the healthy policy state. While servers and ATMs get infected continuously and uniformly during the 2-day period, the workstation charts show two jumps, which are caused by people arriving at work in the morning and turning their computers on.




      Illustration 10: Infection trend in the whole data set, separated by machine classes.

      However, we observed another suspicious fact: on the second morning several computers show deviance after being turned on, although they were not infected (or at least less infected) on the previous day before shutting down. A possible explanation is that most computers connect to a server after being turned on. Since a large number of servers are also infected, the probability of the infection spreading to the workstations increases.
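
The comparison behind this observation can be sketched as follows. This is an illustrative Python sketch rather than our tool's code: the function name, the machine ids and the sample values are hypothetical, and we assume the policy status is encoded as an integer where a larger value means a worse state.

```python
def worsened_after_restart(last_before_shutdown, first_after_startup):
    """Return the ids of machines whose policy status got worse overnight.

    Both arguments map machine id -> policy status, assumed to be an
    integer where a larger value means a worse state. Machines missing
    from either snapshot are ignored.
    """
    return sorted(
        machine
        for machine, before in last_before_shutdown.items()
        if machine in first_after_startup
        and first_after_startup[machine] > before
    )


# Hypothetical snapshots (values invented for illustration only).
evening = {"ws-1": 1, "ws-2": 2, "ws-3": 1}
morning = {"ws-1": 2, "ws-2": 2, "ws-3": 1, "ws-4": 3}
print(worsened_after_restart(evening, morning))
```

Dividing the length of the returned list by the number of machines present in both snapshots gives the deviation rate compared in Illustration 11.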




      Illustration 11: Policy status transitions according to local time: on the second morning the newly turned-on workstations show a higher deviation rate than they did on the previous day right before shutting down

      A Prezi presentation about the anomalies